{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "62129e1a-e9de-454e-a714-35ccbcf0b518",
   "metadata": {},
   "outputs": [],
   "source": [
    "#OK\n",
    "import functools as ft\n",
    "import re\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pylab as plt\n",
    "import openai\n",
    "\n",
    "import outlines"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "b20aafe8-b7a3-4df4-878f-b48b74e131df",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "env: OPENAI_API_KEY=# you key here\n"
     ]
    }
   ],
   "source": [
    "%env OPENAI_API_KEY= # you key here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a3514d6-d5d7-46e9-9b69-1251d337e094",
   "metadata": {},
   "source": [
    "In this example we will look at completion results for questions similar to those in the GSM8K dataset, using few-shots prompts with 5 examples. We first use `outlines.Template` to build the few-shot prompt. Outlines uses the Jinja2 templating engine to render the object when the function is called with the variables' values; it thus allows you to build complex prompts very easily."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ffe8bb11-6b51-4fe7-bfb3-c62556a60db8",
   "metadata": {},
   "outputs": [],
   "source": [
    "examples = [\n",
    "    {\n",
    "        \"question\": \"There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?\",\n",
    "        \"answer\": \"We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted. So, they must have planted 21 - 15 = 6 trees. The answer is 6.\",\n",
    "    },\n",
    "    {\n",
    "        \"question\": \"If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?\",\n",
    "        \"answer\": \"There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.\",\n",
    "    },\n",
    "    {\n",
    "        \"question\": \"Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?\",\n",
    "        \"answer\": \"Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74 chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.\",\n",
    "    },\n",
    "    {\n",
    "        \"question\": \"Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\",\n",
    "        \"answer\": \"Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.\",\n",
    "    },\n",
    "    {\n",
    "        \"question\": \"Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?\",\n",
    "        \"answer\": \"He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9.\",\n",
    "    },\n",
    "    {\n",
    "        \"question\": \"There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?\",\n",
    "        \"answer\": \"There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.\",\n",
    "    },\n",
    "    {\n",
    "        \"question\": \"Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?\",\n",
    "        \"answer\": \"Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.\",\n",
    "    },\n",
    "    {\n",
    "        \"question\": \"Olivia has $23. She bought five bagels for $3 each. How much money does she have left?\",\n",
    "        \"answer\": \"She bought 5 bagels for $3 each. This means she spent 5\",\n",
    "    },\n",
    "]\n",
    "\n",
    "\n",
    "few_shot_prompt = outlines.Template.from_string(\n",
    "    \"\"\"\n",
    "    {% for example in examples %}\n",
    "    Q: {{ example.question }}\n",
    "    A: {{ example.answer }}\n",
    "    {% endfor %}\n",
    "    Q: {{ question }}\n",
    "    A:\n",
    "    \"\"\"\n",
    ")\n",
    "\n",
    "# Template instances can be partially evaluated because they are callable objects\n",
    "gsm8k_prompt = ft.partial(few_shot_prompt, examples=examples)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1eae0ec8-89f0-43fc-b055-6fcd64cbc03b",
   "metadata": {},
   "source": [
    "## When `gpt-4o-mini` is uncertain"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a273ed78-e813-467e-85f3-16d7f283ba87",
   "metadata": {},
   "source": [
    "Let us now sample 20 completions with the `gpt-4o-mini` model. Outlines is sampling first, and allows to draw several samples with both OpenAI and `transformers` models easily:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "beff960d-6833-4f24-af09-5b65886a9549",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = outlines.from_openai(openai.OpenAI(), \"gpt-4o\")\n",
    "\n",
    "question = \"When I was 6, my sister was half the age of my brother. When I was 14, my sister was 3 years younger than my brother. Now I'm 70, how old is my sister now?\"\n",
    "prompt = gsm8k_prompt(question=question)\n",
    "answers = model(prompt, n=20, max_tokens=512)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a895b6d-d4d4-40f9-9156-24ba7e21cc08",
   "metadata": {},
   "source": [
    "The correct answer to this question is 67. Let us now count the different answers, and take a look at their distribution. Let us first define a few utility functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f1c83d1f-a478-4509-890e-b84a2e0d8846",
   "metadata": {},
   "outputs": [],
   "source": [
    "def count_digits(answers):\n",
    "    digits = []\n",
    "    for answer in answers:\n",
    "        try:\n",
    "            match = re.findall(r\"\\d+\", answer)[-1]\n",
    "            if match is not None:\n",
    "                digit = int(match)\n",
    "                digits.append(digit)\n",
    "        except AttributeError:\n",
    "            print(f\"Could not parse the completion: '{answer}'\")\n",
    "\n",
    "    unique_digits, counts = np.unique(digits, return_counts=True)\n",
    "    return {d: c for d, c in zip(unique_digits, counts)}\n",
    "\n",
    "\n",
    "def plot_counts(counts):\n",
    "    fig = plt.figure(figsize=(12, 8))\n",
    "    ax = fig.add_subplot(111)\n",
    "\n",
    "    bar = ax.bar(counts.keys(), counts.values())\n",
    "    ax.spines[[\"right\", \"top\", \"left\"]].set_visible(False)\n",
    "    ax.get_yaxis().set_visible(False)\n",
    "    ax.get_yaxis().set_visible(False)\n",
    "\n",
    "    for rect in bar:\n",
    "        height = rect.get_height()\n",
    "        plt.text(\n",
    "            rect.get_x() + rect.get_width() / 2.0,\n",
    "            height,\n",
    "            f\"{height:.0f}\",\n",
    "            ha=\"center\",\n",
    "            va=\"bottom\",\n",
    "            fontsize=20,\n",
    "        )\n",
    "\n",
    "    ax.set_xticks(list(counts.keys()))\n",
    "    ax.set_xlabel(\"Answer\")\n",
    "\n",
    "\n",
    "def entropy(counts):\n",
    "    counts = np.array(list(counts.values()))\n",
    "    probs = counts / np.sum(counts)\n",
    "    log_probs = np.log(probs)\n",
    "    return -np.sum(probs * log_probs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "88668e09-bcd6-4a6a-83a5-838189b910eb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAqsAAAHgCAYAAACCbCTDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVwklEQVR4nO3df7DldX3f8dcbFtmVLb9cDDgLrB0DjNWUlWQzWBaDtEJg0BBiJk5VsKLFKRSsU7u2M8wKZouDjsg4Y0chBn9MTfgh3QkapQiCHSJWFigBAlMgggUkJlVJDXXh0z/2ILuwF9LZe+95372Px8yZvff7PeznvcB893m/53vOt8YYAQCAjnaZ9gAAADATsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtLXkRfb7XCsAAOZDbW+jM6sAALQlVgEAaEusAgvGddddl5NPPjn7779/dt9997ziFa/Icccdl69+9avTHg3YiTn2TNeLXbMK0MIHP/jBXHjhhVm5cmXe/OY3Z8WKFXn88cfzve99LzfccENOOOGEaY8I7IQce6avxnjB91B5gxUwdZ/97Gfz3ve+N6eeemo+85nP5CUveck2+3/+859nt912m9J0wM7KsWfebfcNVmIVaO3JJ5/MgQcemGXLluW+++573l8WAHPBsWcqthurLgMAWrv22mvz+OOP55xzzskuu+ySa665JnfeeWeWLl2aNWvW5Mgjj5z2iMBOyLGnD7EKtPbd7343SbJ06dKsXr06d9555zb7jz766FxxxRXZb7/9pjEesJNy7OnDpwEArf3whz9Mklx44YWpqtx000356U9/mjvuuCNvetObcuONN+atb33rlKcEdjaOPX2IVaC1p59+OkmyZMmSbNy4MUcddVSWL1+e1772tfnKV76SlStX5lvf+lZuvvnmKU8K7Ewce/oQq0Bre++9d5Jk9erVWbVq1Tb7XvrSl+a4445Lktxyyy3zPBmwM3Ps6UOsAq0deuihSZ79i+O59tlnnyTJz372s/kaCVgEHHv6EKtAa8cee2yqKnfdddcvXpbb2jNvenjlK18536MBOzHHnj7EKtDawQcfnJNOOinf//7388lPfnKbfd/4xjfy9a9/PXvvvXeOP/74KU0I7Iwce/pwUwCgvYcffjivf/3r89BDD+XYY4/N6tWr88ADD+Tqq69OVeXLX/5yTjnllGmPCexkHHvmnTtYAQvX448/nvPOOy8bN27MI488kj333DNr167Nhz70oaxZs2ba4wE7KceeeSVWAQBoa7ux6ppVAADaEqsAALQlVgEAaGvJtAcAeDGr1l0z474HLzhxHicBFhPHnh6cWQUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCAAvaF7/4xVRVqiqXXHLJtMdhlolVAGDBeuihh3LmmWdm+fLl0x6FOSJWAYAFaYyRd73rXXnZy16WM844Y9rjMEfEKgCwIF188cX55je/mc997nPZY489pj0Oc0SsAgALzt13351169bl7LPPztFHHz3tcZhDYhUAWFA2b96cd7zjHTnooIOyYcOGaY/DHFsy7QEAAP5/nHfeedm0aVO+/e1vZ9myZdMehznmzCoAsGB85zvfyYYNG/KBD3wgRx555LTHYR6IVQBgQdi8eXPe+c535pBDDsn5558/7XGYJ2IVAFgQnnjiidx77725++67s3Tp0l/cCKCq8uEPfzhJ8p73vCdVlXPOOWe6wzJrXLMKACwIu+++e9797ndvd9+tt96aTZs25aijjsqhhx7qEoGdiFgFABaEZcuWzXg71fXr12fTpk059dRTc/rpp8/zZMwllwEAANCWWAUAoC2xCgAseOvXr88YwyUAOyGxCgBAW2IVAIC2xCoAAG356CoAYMFYte6aF9z/4AUnztMkzBdnVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAAIvEj370o1xyySU5+eST86pXvSrLli3LXnvtlaOOOiqXXnppnn766WmP+DxLpj0AAADz4/LLL8/73ve+HHDAATnmmGNy0EEH5bHHHstVV12V008/PV/72tdy+eWXp6qmPeoviFUAgEXikEMOycaNG3PiiSdml12efYF9w4YNWbNmTa688spcddVVOeWUU6Y45bZcBgAAsEi88Y1vzEknnbRNqCbJ/vvvnzPOOCNJcsMNN0xhspmJVQAAsttuuyVJlizp9cK7WAUAWOQ2b96cz3/+80mS448/fsrTbEusAgAscuvWrcudd96ZE044Iccdd9y0x9mGWAUAWMQuvvjifPzjH89hhx2WL3zhC9Me53nEKgDAIvWpT30qZ599dl796lfn+uuvz7777jvtkZ5HrAIALEIXXXRRzjrrrLzmNa/J9ddfn/3333/aI22XWAUAWGQ++tGP5v3vf38OP/zwXH/99Xn5y18+7ZFmJFYBABaR888/P+vWrcsRRxyR6667LitWrJj2SC+o1wdpAQAwZy677LKce+652XXXXbN27dpcfPHFz3vOqlWrctppp83/cDMQqwAAi8QDDzyQJHnqqady0UUXbfc5b3jDG1rFqssAAAAWifXr12eM8YIPt1sFAIC/J7EKAEBbYhUAgLa8wQoAYBFZte6aGfc9eMGJ8zjJ348zqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoq2WsXnHFFTnrrLOydu3a7LnnnqmqvP3tb5/2WAAAc0L7zGzJtAfYno985CO5/fbbs3z58qxcuTL33HPPtEcCAJgz2mdmLc+sfuITn8i9996bn/zkJ/n0pz897XEAAOaU9plZyzOrxxxzzLRHAACYN9pnZi3PrAIAQCJWAQBoTKwCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2mp5U4Crr746V199dZLk0UcfTZLcfPPNOe2005IkK1asyMc+9rEpTQcAMLu0z8xaxuptt92Wyy67bJtt999/f+6///4kycEHH7xo/4MBADsf7TOzlpcBrF+/PmOMGR8PPvjgtEcEAJg12mdmLWMVAAASsQoAQGNiFQCAtlq+wSpJVq27ZsZ9D15w4jxOAgAwt3TPzJxZBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2hKrAAC0JVYBAGhLrAIA0JZYBQCgLbEKAEBbYhUAgLbEKgAAbYlVAADaEqsAALQlVgEAaEusAgDQllgFAKAtsQoAQFtiFQCAtsQqAABtiVUAANoSqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG3VGGPmnVV/mmTF/I0zoxVJ/mraQwBtOCYAz5jP44G15tZfjTGOf+7GF4zVLqrqv48xfnXacwA9OCYAz5jP44G1psNlAAAAtCVWAQBoa6HE6memPQDQimMC8Iz5PB5YawoWxDWrAAAsTgvlzCoAAItQ+1itqr2r6oqquqeq7q6qI6c9EzA/qmppVd1SVbdX1Z9X1Ycn2/+wqh6oqtsmj8OnPCowD16oCarqA1U1qmpWPnJze2tV1R9tddx5sKpum4V1Dt3q97ytqn5SVedU1b5VdW1V3Tf5dZ85XOv8qrpjsu0bVfWKHV1rNrW/DKCqLkty0xjjkqp6SZKXjjH+95THAuZBVVWSPcYYT1TVbkm+neTsJGck+ZMxxhVTHRCYVzM1QVUdmOSSJIclOWKMscOfGfpi/VFVH0/y4zHGeTu61la/565JfpDk15P8qyR/Pca4oKrWJdlnjPHv5mitvxlj/GSy/V8nefUY44zZWmtHtT6zWlV7JTk6yaVJMsb4v0IVFo+xxROTb3ebPHr/hA3MiRdpgk8k+WBm6fjwYv0x+UH6d5P859lYbyvHJvmfY4y/TPKWJJdNtl+W5Lfmaq1nQnVijzQ7zraO1SSvTPJ4ks9V1aaquqSq9pj2UMD8qapdJy+1/TDJtWOM70x2/f7kZatPVNXu05sQmCfbbYKqekuSH4wxbp/rtbbavzbJY2OM+2ZxzST5vTwbwL80xnhk8vWjSX5pDtdKVf1+VT2U5J8nOXeW19oh3WN1SZLXJfn0GGN1kr9Nsm66IwHzaYzx1Bjj8CQrk6ypqtck+VC2vNz3a0n2TTJrL40BbW2vCdYn+feZ/bh6sf54W2b5rOrkUoM3J7n8ufvGlms2Z+1s5/bWGmP8hzHGgUm+lOTM2VprNnSP1YeTPLzVmZQrsuV/HmCRmbwEd32S48cYj0wuEXgyyeeSrJnqcMB8mKkJXpnk9qp6MFt+qL21qvafo7VSVUuS/HaSP9rBNZ7rN5PcOsZ4bPL9Y1V1wGTNA7Ll1aW5WmtrX0pyyiyutcNax+oY49EkD1XVoZNNxya5a4ojAfOoqvarqr0nXy9L8s+S3LPVAbyy5TquO6c1IzA/ZmiCW8cYLx9jrBpjrMqWyHzd5LmzvdYz/fFPk9wzxnh4R9bYjueerd2Y5NTJ16cm+S9ztVZV/fJW+96S5J5ZXGuHLYRPAzg8W97h95Ik9yd51xjjb6Y6FDAvqupXsuWNBbtmyw/XfzzGOK+qvplkvySV5LYkZ2z1RixgJ/ViTTA5u/qrs/RpANtdq6r+MMmfjTH+046usdVaeyT5fpJ/OMb48WTby5L8cZKDkvxlkt8dY/z1HK11ZZJDkzw9WeuMMcYPdnSt2dI+VgEAWLxaXwYAAMDiJlYBAGhLrAIA0JZYBQCgLbEKAEBbYhXgOarqt6pqVNVh054FYLETqwDP97Yk3578OhWTu+QALHpiFWArVbU8yVFJ3p3k9ybbfqOqbqiqK6rqnqr60uTuWamqC6rqrqq6o6o+VlW7VtUDtcXeVfVUVR09ee6NVfXLVbVHVf1BVd1SVZuq6i2T/adV1cbJTQ+um86/AYBe/OQOsK23JPnTMca9VfWjqjpisn11kn+U5H8l+W9J/klV3Z3k5CSHjTFGVe09xniqqv4iyauz5Z7ltyZZW1XfSXLgGOO+qtqQ5JtjjH8xuZ3sLVX1XyfrvC7Jr8zGnWoAdgbOrAJs621Jvjz5+st59lKAW8YYD48xns6WW7yuSvLjJH+X5NKq+u0k/2fy3JuSHD15/MdsOVP7a0m+O9n/piTrquq2JDckWZott1RMkmuFKsCznFkFmKiqfZO8Mclrq2ok2TXJSHJNkie3eupTSZaMMTZX1Zokxyb5nSRnTv75G5O8L8krkpyb5N8m+Y1sidgkqSSnjDH+4jnr/3qSv52TPxzAAuXMKsCzfifJF8YYB48xVo0xDkzyQJK123vy5PrWvcYYX03y/iT/eLLrliSvT/L0GOPvsuVM7L/MlohNkq8nOWur615Xz9GfB2DBE6sAz3pbkq88Z9uVmflTAf5Bkj+pqjuy5dMD/k2SjDGeTPJQkj+bPO+myXP/x+T785PsluSOqvrzyfcAbEeNMaY9AwAAbJczqwAAtCVWAQBoS6wCANCWWAUAoC2xCgBAW2IVAIC2xCoAAG2JVQAA2vp/jdj4sUZoV2sAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 864x576 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "counts = count_digits(answers)\n",
    "plot_counts(counts)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "661a1135-ac2d-4a49-a786-d04a7ba68b48",
   "metadata": {},
   "source": [
    "We see that there is an important variabilty in the answers given by `gpt-4o-mini`. Depending on the number of samples taken, even self-consistency sampling may lead to the wrong result here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "30ea0dfe-6c15-44f0-881c-88b325542b44",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Entropy: 1.5741030017371853\n"
     ]
    }
   ],
   "source": [
    "print(f\"Entropy: {entropy(counts)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b15b230-b667-4c9c-8a5d-366dd61de9b7",
   "metadata": {},
   "source": [
    "## `gpt-4o-mini` on an easier question"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "beae30f0-4168-4a80-90d4-d26a4f476469",
   "metadata": {},
   "source": [
    "Let us now look at the results for an arguably easier question:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7e106b94-2dfd-4a75-b4d9-b1ad693418a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"When I was 6 my sister was half my age. Now I’m 70 how old is my sister?\"\n",
    "prompt = gsm8k_prompt(question)\n",
    "answers = model(question, samples=20, max_tokens=512)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "dd46fb2b-08ef-4003-8d03-ea0f39c865c4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Entropy: 0.1985152433458726\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAqsAAAHgCAYAAACCbCTDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQuklEQVR4nO3db6ye9V3H8c+PPxGxIjMVghSLkCYgOtmUzSCd6xii8mCIyxBCTBUTWbBmIzEhWeIAUfegSYkha6JIQsmSRaoclnUbmeCyQbrBoi2GrtSlEIZKxnQOCmPB9vLBuaH8aUui4dyfwuuVnPQ+13Wd3N/TR+/++ruva0zTFAAAaHTEvAcAAICDEasAANQSqwAA1BKrAADUEqsAANQSqwAA1Drqdc67rxUAAEthHOiglVUAAGqJVQAAaolVAIA3sc2bN2fdunVZvXp1jjvuuIwxcsUVVxz0+meeeSYf+9jHcsYZZ+SYY47J2972tlx44YW55557lnDq/V5vzyoAAIexG2+8Mdu3b8+yZcuyYsWK7Ny586DXfve73815552XHTt25KyzzspVV12VPXv25K677sr73//+3HLLLbnyyiuXcHorqwAAb2obNmzIrl278vTTT2fjxo2HvPa6667Ljh07cskll2Tbtm256aabcsstt+Thhx/OKaecknXr1uWJJ55YoskXiVUAgDexNWvWZNWqVRnjgB+2f4U777wzSXLDDTfkqKP2/wf8CSeckGuuuSbf//73c+utt75hsx6IWAUAIEny5JNPJklOO+2015x78dhS710VqwAAJEmWL1+eJHn00Udfc2737t1JkkceeWRJZxKrAAAkSS666KIkycc//vHs3bv3peNPPfVUNmzYkGTxQ1hLyd0AAABIsrhX9e67787mzZtz9tln5/zzz8+zzz6bu+66KyeffHIef/zxHHHE0q51WlkFACBJctJJJ+XBBx/M1VdfnWeeeSaf/OQns2XLllx66aW54447kix+2GopWVkFAOAlJ554Ym6++ebcfPPNrzh+7733JknOOeecJZ3HyioAAK9r06ZNSZLLL798Sd9XrAIAkCTZt29f9uzZ85rjt99+ezZt2pRzzz03F1988ZLOZBsAAMCb2MLCQhYWFpLsv4/q1q1bs3bt2iSLt6tav359kuS5557LiSeemAsuuCCnn356jjjiiNx///3ZunVrzjzzzNxxxx1L/gGrMU3Toc4f8iQAAN2uu+66XH/99Qc9v3Llyjz22GNJkhdeeCFXXXVV7rvvvpceq7pq1ap86EMfykc+8pEce+yxb+SoB3zEllgFAKDBAWPVnlUAAGqJVQAAaolVAABqiVUAAHLqtVvmPcIBiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABqiVUAAGrVxOrmzZuzbt26rF69Oscdd1zGGLniiivmPRYAAHN01LwHeNGNN96Y7du3Z9myZVmxYkV27tw575EAAJizmpXVDRs2ZNeuXXn66aezcePGeY8DAECBmpXVNWvWzHsEAADK1KysAgDAq4lVAABqiVUAAGqJVQAAaolVAABqiVUAAGqJVQAAaolVAABq1TwUYGFhIQsLC0mSJ598MkmydevWrF27NkmyfPnyrF+/fk7TAQAwDzWxum3bttx2222vOLZ79+7s3r07SbJy5UqxCgDwFjOmaTrU+UOeBADgzeHUa7fksU9cNM8RxoEO2rMKAEAtsQoAQC2xCgBArdpYPfXaLfMeAQCAOauNVQAAEKsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BKrAADUEqsAANQSqwAA1BrTNB385BhfSLJ86cZ5heVJvjOn9wYAeCuaZ399Z5qmX3v1wUPG6jyNMb4+TdMvznsOAIC3isb+sg0AAIBaYhUAgFrNsfpX8x4AAOAtpq6/avesAgBA88oqAABvcXWxOsY4ZozxwBhj+xjj4THG9fOeCQDgcHewxhqL/myMsWuM8Y0xxh+97PhfjjG+OcZ4aIzxznnMfdQ83vR1/CDJ+6Zp2jPGODrJfWOMz0/T9NV5DwYAcBg7YGMlOTPJKUnOmKZp3xjjhNn1v55k1ezr3Uk2zv5cUnWxOi1uot0z+/bo2ZeNtQAA/w+HaKwPJ7l8mqZ9s+u+PbvmA0k2zX7uq2OM48cYJ03T9B9LOXfdNoAkGWMcOcbYluTbSb44TdPX5jwSAMBh7yCNdXqSS8cYXx9jfH6MsWp2+clJvvWyH39idmxJVcbqNE17p2k6O8mKJO8aY/zsnEcCADjsHaSxfijJ87MnV/11klvnOOJrVMbqi6Zp+u8k/5jkNc+JBQDg/+ZVjfVEkr+fnbozydtnr/8ti3tZX7RidmxJ1cXqGOMnxhjHz17/cJILkuyc61AAAIe5QzTWQpI1s8t+Jcmu2evPJPmd2V0BfinJ95Z6v2pS+AGrJCcluW2McWQWY/pvp2n67JxnAgA43B2wscYY9yX51Bjjo1n8ANbvz67/XJLfSPLNJM8l+d05zOwJVgAA9KrbBgAAAC8SqwAA1BKrAADUEqsAANQSqwAA1BKrAK8yxrh4jDGNMc6Y9ywAb3ViFeC1Lkty3+zPuRhjNN4HG2DJiVWAlxljLEtyXpIrk/z27Nh7xxhfGmNsHmPsHGN8aowxZuc+McbYMcZ4aIyxfoxx5Bjj0dkTX44fY+wdY7xndu2Xxxirxhg/Msa4dYzxwBjjn8cYH5idXzvG+MwY494k98znbwCgi3+5A7zSB5J8YZqmXWOM/xxj/MLs+DuSnJXk35Pcn+SXxxjfSPKbSc6YpmkaYxw/TdPeMcYjSX4myU8n+ackq8cYX0tyyjRN/zrG+PMk907T9HuzRx8+MMb4h9n7vDPJ26dp+q+l+oUBmllZBXily5J8evb609m/FeCBaZqemKZpX5JtSU5N8r0kzyf5mzHGJVl8HGGSfCXJe2Zff5HFldpzkjw4O/+rSa4dY2xL8qUkxyT5qdm5LwpVgP2srALMjDF+PMn7kvzcGGNKcmSSKcmWJD942aV7kxw1TdP/jDHeleT8JB9M8oezn/9ykg8n+ckkf5Lkj5O8N4sRmyQjyW9N0/TIq97/3UmefUN+OYDDlJVVgP0+mOT2aZpWTtN06jRNpyR5NMnqA10829/6Y9M0fS7JR5P8/OzUA0nOTbJvmqbns7gS+wdZjNgkuTvJupfte33HG/T7ABz2xCrAfpclufNVx/4uB78rwI8m+ewY46Es3j3gmiSZpukHSb6V5Kuz674yu/ZfZt//aZKjkzw0xnh49j0ABzCmaZr3DAAAcEBWVgEAqCVWAQCoJVYBAKglVgEAqCVWAQCoJVYBAKglVgEAqCVWAQCo9b+lzUDoz9UHogAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 864x576 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "counts = count_digits(answers)\n",
    "plot_counts(counts)\n",
    "print(f\"Entropy: {entropy(counts)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf4cacdf-a31d-43bd-8517-eec9f656eee4",
   "metadata": {},
   "source": [
    "The entropy of the results is much lower, we say that the model is more \"certain\" of its answers. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22f31872-aab7-4a68-b9f2-d335a4f1a875",
   "metadata": {},
   "source": [
    "## How `gpt-4` compares to `gpt-4o-mini`\n",
    "\n",
    "Let us now look at how GPT4 fares on the original question:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2d5ab5b8-eca5-47f5-a35c-5f3865e35755",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = outlines.from_openai(openai.OpenAI(), \"gpt-4\")\n",
    "\n",
    "question = \"When I was 6, my sister was half the age of my brother. When I was 14, my sister was 3 years younger than my brother. Now I'm 70, how old is my sister now?\"\n",
    "prompt = gsm8k_prompt(question)\n",
    "answers = model(prompt, samples=20, max_tokens=512)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "d316a5f7-cebc-4b09-9b1b-aee219b2f088",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Entropy: -0.0\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAqwAAAHgCAYAAABgsD+6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQI0lEQVR4nO3dYahf9X3H8c+vua6J3Ui4hKKDOpVMxTmn1Tlwatt0rKKUziZgfJa5B2tg0TkY5EEZLepWYYITwTJ0o5Wi08ypXYdjMxG76OpkLc51jQUdU4ZiHaagREg8e5C/0WhiNWlyP7l9vSDk/s/v/Lnffx6Ed05+95wxTVMAAKDVhxZ6AAAAeC+CFQCAaoIVAIBqghUAgGqCFQCAaoIVAIBqcz9h3T2vAAA4EsaBFlxhBQCgmmAFAKCaYAV4H15++eXcdtttueyyy7Jq1aosW7Ysy5cvzwUXXJDbb789b7zxxn7f9+ijj+aSSy7J/Px8li1bljPPPDM33XRTdu/efYQ/AcDRa/yER7PawwqQ5Ktf/Wo2bNiQ448/Pp/61Kdywgkn5MUXX8y9996bHTt2ZM2aNbnnnnsyxltbsO6///6sWbMmS5cuzeWXX575+fl885vfzPbt27N27drcc889C/iJAOoccA+rYAV4H7Zs2ZJXX301l156aT70obf+c+qFF17Ieeedl+eeey6bN2/OmjVrkiQ//vGPs2rVquzYsSPbtm3LueeemyTZuXNnVq9encceeyx33nln1q1btyCfB6CQH7oCOBSrV6/OZz/72X1iNUmOO+64fOELX0iSPPzww3uPb968OS+99FLWrVu3N1aTZOnSpbnuuuuSJLfeeuvhHxxgERCsAIfomGOOSZLMzb11p8AtW7YkSS6++OJ3nX/RRRfl2GOPzaOPPprXX3/9yAwJcBQTrACHYNeuXfn617+eZN843b59e5LklFNOedd75ubmctJJJ2XXrl155plnjsygAEcxwQpwCDZt2pSnnnoql1xyST7zmc/sPb5jx44kyfLly/f7vjePv/LKK4d9RoCjnWAFOEg333xzbrzxxpx22mm54447FnocgEVLsAIchFtuuSVXX311Tj/99GzdujXz8/P7rL95BfXNK63v9ObxFStWHNY5ARYDwQrwAd10003ZuHFjzjjjjGzdujXHHXfcu8459dRTkyRPP/30u9Z27dqVZ599NnNzczn55JMP+7wARzvBCvAB3HDDDbnmmmty1llnZevWrfnoRz+63/NWr16dJHnwwQfftfbII4/ktddey/nnn58Pf/jDh3VegMVAsAK8T9dee202bdqUc845Jw899FBWrlx5wHPXrl2blStX5q677soTTzyx9/jOnTvzxS9+MUmyYcOGwz4zwGLgSVcA78PXvva1rF+/PkuWLMnGjRv3+9P/J554YtavX7/39X333Ze1a9dm6dKlWbduXebn5/PAAw/sfTTr3Xffvc+jXAF+xnk0K8Ch+NKXvpQvf/nL73nOJz7xiX2edpUk27Zty/XXX5/HHnssO3fuzKpVq3LllVfmqquuypIlSw7jxABHHcEKAEC1AwarPawAAFQTrAAAVBOsAABUm1voAQ7kxE3fWugRAAB+pvz3Vy5d6BH2yxVWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKqNaZoOvDjGg0lWHrlxABaFlUl+tNBDABxlfjRN08X7W3jPYAXggxtjPDFN07kLPQfAYmFLAAAA1QQrAADVBCvAT99fLvQAAIuJPawAAFRzhRUAgGpzCz0AwNFsjLEiyW1JzkgyJbkyyR8mOXV2yookr0zTdNaRnw5gcRCsAIfmL5I8OE3T2jHGzyU5dpqmy99cHGPcmGTHgk0HsAjYwwpwkMYYy5N8L8nJ037+Mh1jjCT/k2T1NE0/PMLjASwa9rACHLyTkryU5K/HGN8dY9w2xvjI29YvTPKiWAU4NIIV4ODNJfl4klunaTo7yatJNr1t/Yokdy7EYACLiWAFOHjPJ3l+mqbvzF5vzp6AzRhjLsnnk/zNAs0GsGgIVoCDNE3TC0meG2O8eUeATyf5/uzr30ryg2manl+Q4QAWEXcJADg0G5N8Y3aHgGeS/O7s+LrYDgDwU+EuAQAAVLMlAACAaoIVAIBqghUAgGqCFQCAaoIVAIBqghXgHcYYvzPGmMYYpy30LAAIVoD9uSLJv8x+XxCzJ2UBEMEKsI8xxs8nuSDJ72XPzf8zxvjkGOPhMcbmMcYPxhjfGGOM2dpXxhjfH2M8Ocb48zHGkjHGs2OPFWOM3WOMi2bnPjLG+OUxxkfGGH81xnh8jPHdMcbnZuvrxxgPjDG2JHloYf4EAPr4FzzAvj6X5MFpmp4eY7w8xjhndvzsJL+S5H+TbEvym2OM/0pyWZLTpmmaxhgrpmnaPcbYnuT0JCcl+fckF44xvpPkY9M0/XCM8adJtkzTdOUYY0WSx8cY/zz7Ph9PcuY0Tf93pD4wQDtXWAH2dUWSu2Zf35W3tgU8Pk3T89M0vZHke0lOTLIjyc4kt48xPp/ktdm5305y0ezXn2XPFdtfT/Jvs/XfTrJpjPG9JA8nWZrkhNnaP4lVgH25wgowM8aYT7I6ya+OMaYkS5JMSb6V5PW3nbo7ydw0TbvGGOcl+XSStUn+YPb+R5JsSPKLSf4kyR8n+WT2hGySjCRrpmna/o7v/xtJXj0sHw7gKOYKK8Bb1ia5Y5qmX5qm6cRpmj6W5NkkF+7v5Nl+1+XTNP1DkmuS/Nps6fEk5yd5Y5qmndlzRfb3sydkk+Qfk2x82z7Ysw/T5wFYFAQrwFuuSPJ37zj2tznw3QJ+IcnfjzGezJ67CvxRkkzT9HqS55L86+y8b8/O/Y/Z62uTHJPkyTHGf85eA3AAY5qmhZ4BAAAOyBVWAACqCVYAAKoJVgAAqglWAACqCVYAAKoJVgAAqglWAACqCVYAAKr9P8bb7HZA9fu3AAAAAElFTkSuQmCC",
      "text/plain": [
       "<Figure size 864x576 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "counts = count_digits(answers)\n",
    "plot_counts(counts)\n",
    "print(f\"Entropy: {entropy(counts)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f6c8a22-fdf5-4f30-865c-8e11927b1b7c",
   "metadata": {},
   "source": [
    "GPT4 returns the correct answer with certainty."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50d4a55e-86df-46ab-8b38-302c79bc8add",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "When generating text completions with a language model we typically look at one output sample, trying to find the \"right\" answer. However, doing so we obscure the diversity of answers that these language models can produce. Assuming the diversity of answers reflects these models' \"uncertainty\", we can use measures such as the entropy of the answers' distribution to evaluate the quality of the answer.\n",
    "\n",
    "Which result should we be choosing once we have different samples? There is no definite answer to this question. The [self-consistency method](https://arxiv.org/abs/2203.11171) consists in choosing the result based on a majority vote. We think this choice is arbitrary and that choosing the correct answer is a [decision theory](https://en.wikipedia.org/wiki/Decision_theory) problem, which can only be solved by specifying a loss function that is adapted to the experiment's context; the majority vote being a particular case with a 0-1 loss."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
